System and Method for Performing Memory Operations In A Computing System

ABSTRACT

A processor may operate in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon receipt of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line and data associated with the memory location. While in the Transaction operating state, any changes to the data and the cache line are not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction. After changes become visible, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 13/084,280, filed Apr. 11, 2011, which is a continuation application of U.S. application Ser. No. 12/168,689 filed Jul. 7, 2008, now U.S. Pat. No. 7,925,839, which is a continuation of U.S. application Ser. No. 10/836,932 filed Apr. 30, 2004, now U.S. Pat. No. 7,398,359 which claims the benefit of U.S. Provisional Application No. 60/467,019 filed Apr. 30, 2003, all of which are hereby incorporated by reference herein.

TECHNICAL FIELD OF THE INVENTION

The present invention relates in general to computer system processing and more particularly to a system and method for performing memory operations in a computing system.

BACKGROUND OF THE INVENTION

In computer systems, there is a disparity between processor cycle time and memory access time. Since this disparity limits processor utilization, caches have been introduced to solve this problem. Caches, which are based on the principle of locality, provide a small amount of extremely fast memory directly connected to a processor to avoid the delay in accessing the main memory and reduce the bandwidth needed to the main memory. Even though caches significantly improve system performance, a coherency problem occurs as a result of the main memory being updated with new data while the cache contains old data. For shared multi-processor systems, a cache is almost a necessity since access latency to memory is further increased due to contention for the path to the memory. It is not possible for the operating system to ensure coherency since processors need to share data to run parallel programs and processors cannot share a cache due to bandwidth constraints.

Various algorithms and protocols have been developed to handle cache coherency. For example, in a directory based caching structure, a write invalidate scheme allows for a processor to modify the data in its associated cache at a particular time and force the other processors to invalidate that data in their respective caches. When a processor reads the data previously modified by another processor, the modifying processor is then forced to write the modified data back to the main memory. Though such a scheme handles cache coherency in theory, limitations in system performance are still apparent.

SUMMARY OF THE INVENTION

From the foregoing, it may be appreciated by those skilled in the art that a need has arisen for an extended coherency protocol and an ability to track access to memory locations involved in a transaction and processor state information. In accordance with the present invention, there is provided a system and method for performing memory operations in a computing system that substantially eliminates or greatly reduces disadvantages and problems associated with conventional coherency protocols.

According to an embodiment of the present invention, there is provided a system for performing memory operations in a computing system that includes a processor that operates in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon execution of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line in a cache memory associated with the memory location to include changing from a MESI coherency protocol to one of a plurality of transactional coherency states associated with the Transaction operating state. While in the Transaction operating state, any changes to the data and the cache line are not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction.

After changes become visible and the cache line is returned to the MESI coherency protocol, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded. Upon discarding the changes, the processor transitions to a Suspended state and awaits receipt of a commit instruction before transitioning to the Normal operating state.

The present invention provides various technical advantages over conventional coherency protocols. For example, one technical advantage is to treat memory access and operations as transactions. Another technical advantage is to provide a transaction record in the processor to track the state of the processor during memory transactions. Yet another technical advantage is to integrate an extended cache coherency protocol with the transaction record of the processor. Embodiments of the present invention may include all, some, or none of these technical advantages while other technical advantages may be readily apparent to those skilled in the art from the following figures, description, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings, wherein like reference numerals represent like parts, in which:

FIG. 1 illustrates a state diagram for a processor in a computing system;

FIG. 2 illustrates the implementation of a transaction record maintained by the processor;

FIG. 3 illustrates the cache coherency state transitions due to instruction execution.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 shows a state diagram of the transition states that are entered into by a processor during operation. The transition states include Normal, Transaction, Commit, Abort, and Suspended. The Normal state indicates that there is no active transaction to process. The Transaction state indicates that a transaction is in progress. The Commit state indicates that a transaction has successfully completed but the transaction is in the process of being cleaned. The Abort state indicates that a transaction has been aborted but the transaction is still in the process of being cleaned. The Suspended state indicates that a transaction has been aborted and cleaned but the processor has not executed a Commit or Abort instruction.

In order to support transactions, the processor provides support for tracking access to memory locations involved in a transaction and state information for recording the processor's transaction state. To track transaction states, each processor maintains a Transaction Record as well as a mechanism (such as a pointer to a free list) to obtain memory locations for storage of additional transaction state information. In addition, the primary data cache state field is expanded to include the states of Invalid (I), Shared (S), Exclusive (E), Dirty (D), Shared Transactional (ST), Exclusive Transactional (ET), and Dirty Transactional (DT). Each cache tag also includes two added bits, TV and TVE, to indicate that transaction data formerly resided in that line and has been evicted. The TV bit indicates that data was evicted from the ST state. The TVE bit indicates that data was evicted from the ET or DT state. These bits are persistent through changes to the tag but are cleared when the transaction state is cleaned up during the Abort or Commit states.

FIG. 2 shows the implementation of a Transaction Record maintained by the processor. The Transaction Record is a set of hardware registers in the processor storing the following fields: TState {Normal, Transaction, Commit, Abort, Suspended}, WBPtr {pointer to WBRrecord}, and EvictPtr {pointer to evicted shared addresses}. Other information may be included to support additional functionality. When in the Normal state, the processor begins a transaction with the execution of any Transactional Memory Reference instruction (see following section for description of these instructions). This causes transition 1 in the state diagram of FIG. 1 and causes the processor to set the Transaction Record to the Transaction state. As long as the processor remains in the Normal state, it is not involved in a transaction and its actions obey the conventional coherency protocols.
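By way of illustration only, the fields of the Transaction Record may be modeled in software as shown below. The record is described above as a set of hardware registers, so this is merely a sketch; the names `TransactionRecord`, `wb_ptr`, `evict_ptr`, and `on_transactional_memory_reference`, and the use of `None` for a null pointer, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TState(Enum):
    NORMAL = "Normal"
    TRANSACTION = "Transaction"
    COMMIT = "Commit"
    ABORT = "Abort"
    SUSPENDED = "Suspended"


@dataclass
class TransactionRecord:
    tstate: TState = TState.NORMAL
    wb_ptr: Optional[int] = None     # head of the Writeback list; None models null
    evict_ptr: Optional[int] = None  # head of the evicted shared address list


def on_transactional_memory_reference(rec: TransactionRecord) -> None:
    # Transition 1 of FIG. 1: only a processor in the Normal state
    # begins a transaction; other states are left untouched here.
    if rec.tstate is TState.NORMAL:
        rec.tstate = TState.TRANSACTION
```

A freshly constructed record starts in the Normal state with both pointers null, and the first transactional memory reference moves it to the Transaction state.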

Upon entering the Transaction state, the processor's behavior changes as it is now engaged in a transaction and, from that point until a successful Commit state, the processor will do nothing which will cause the state of memory visible to other processors in the system to change. The processor's cache is used to hold changes which it makes, and any data which is evicted from the primary data cache is copied into an eviction list instead of being sent back to its normal memory location. Upon entering the Commit state, all changes to memory performed during the transaction are made globally visible. If, instead, the transaction aborts, the locations in the cache containing changes made during the transaction and the evicted writebacks are discarded, restoring the state of memory (as viewed by all processors) to what it was at the beginning of the transaction.

While in the Transaction state, any transactional load instruction to a new address adds that address to the transaction's Read Set and any transactional load exclusive or transactional store instruction adds that address to the transaction's Write Set. Any attempt by another processor to write to an address in the Read Set, or to read from or write to an address in the Write Set, will cause the current transaction to abort (transition 3 in the state diagram of FIG. 1). An abort will also be caused by any exception during the transaction or by the execution of an Abort instruction. Certain simple exceptions, especially Translation Lookaside Buffer (TLB) misses (if these are still handled in software), may be permitted to occur without causing an abort. An ABORT instruction may be added at the beginning of the exception handlers instead of doing the abort in hardware.
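The conflict rule stated above may be expressed, as a minimal sketch, by the following function. The Read Set and Write Set are modeled here as Python sets of addresses, and the function name `conflicts` and the operation labels `"read"` and `"write"` are assumptions introduced for the example; they do not correspond to any named hardware signal.

```python
def conflicts(read_set, write_set, remote_op, addr):
    """Return True if a remote access should abort the local transaction."""
    if remote_op == "write":
        # A remote write conflicts with anything the transaction has touched.
        return addr in read_set or addr in write_set
    if remote_op == "read":
        # A remote read conflicts only with locations in the Write Set.
        return addr in write_set
    raise ValueError(f"unknown remote operation: {remote_op!r}")
```

Note that a remote read of an address that is only in the Read Set does not cause an abort, which allows several transactions to share read-only data.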

While in the Transaction state, the processor's response to incoming coherency messages (Invalidate, Update, and Intervention requests) is modified as follows: Invalidate and Update requests are processed normally, except that if the primary cache line a request targets has a TV or TVE bit set, the coherency address is also checked against all addresses in the Evicted or Writeback list, respectively. If both bits are set, both lists will be checked. If the coherency address matches any address in one of these lists, or if it hits a line in the ST, ET, or DT states, the transaction aborts (see below for details of the abort operation). Intervention requests that match the tag of a line in the DT state will be processed as if the line were in the ET state; the processor responds with a message indicating that the contents of memory should be used. If the TVE bit for the line is set, the Intervention address is also checked against the Writeback list. If the Intervention address matches a tag or a list address, the transaction aborts.
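The checks described above may be sketched as follows. This is one possible reading rather than a definitive implementation: `line_state` is assumed to be the state of the line whose tag matches the address (or `"I"` when no tag matches), `tv` and `tve` are the bits of the cache line the address maps to, and the Intervention case is interpreted as aborting on a tag match with an ET or DT line. All names are hypothetical.

```python
TRANSACTIONAL = {"ST", "ET", "DT"}


def should_abort(kind, addr, line_state, tv, tve, evicted, writeback):
    """Decide whether an incoming coherency request aborts the transaction."""
    if kind in ("invalidate", "update"):
        if line_state in TRANSACTIONAL:
            return True
        if tv and addr in evicted:       # TV set: check the Evicted list
            return True
        if tve and addr in writeback:    # TVE set: check the Writeback list
            return True
        return False
    if kind == "intervention":
        if line_state in ("ET", "DT"):   # tag match on a Write Set line
            return True
        return bool(tve and addr in writeback)
    raise ValueError(f"unknown request kind: {kind!r}")
```

The lists are consulted only when the corresponding bit is set, which reflects the purpose of the TV and TVE bits: they avoid a list search on every incoming request.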

Other than an abort, the only way to exit the Transaction state is the execution of a Commit instruction, which causes the transaction state machine to go to the Commit state (transition 2 in the state diagram of FIG. 1). In the Commit state, all changes to memory performed during the committed transaction are made visible to the rest of the system. To accomplish this, the following actions are performed:

-   The Evicted Address list is discarded and the tokens in the list are attached to the end of the free list. The Evict Pointer is set to null.
-   All writebacks stored in the Writeback list are converted to WEack messages and written to their home node. All tokens in the Writeback list are attached to the end of the free list. The Writeback Pointer is set to null. The L2 cache is invalidated at the address of the writeback if that address is currently stored in the L2 cache.
-   All TV and TVE bits in the primary cache are set to zero.
-   All cache lines in the ST state transition to the S state. All cache lines in the ET state transition to the E state. All cache lines in the DT state transition to the D state.

Upon completion of the above actions, the processor transitions to the Normal state (transition 4 in the state diagram of FIG. 1).
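The commit actions listed above may be illustrated by the following sketch. The data layout is an assumption made for the example: the cache is a dictionary mapping an address to a record with `state`, `tv`, and `tve` entries, the Writeback list holds `(address, data)` pairs, list entries themselves stand in for tokens, and the outgoing message is represented by a plain tuple whose format is hypothetical.

```python
COMMIT_STATE = {"ST": "S", "ET": "E", "DT": "D"}


def commit_cleanup(cache, evicted, writeback, free_list, l2):
    """Apply the commit actions; return the writeback messages to send."""
    # Discard the Evicted Address list; its tokens return to the free list.
    free_list.extend(evicted)
    evicted.clear()
    # Turn each stored writeback into a message for its home node,
    # invalidate any L2 copy, and recycle the tokens.
    messages = [("writeback", addr, data) for addr, data in writeback]
    for addr, _ in writeback:
        l2.discard(addr)
    free_list.extend(writeback)
    writeback.clear()
    # Clear TV/TVE and move transactional lines to their plain states.
    for line in cache.values():
        line["tv"] = line["tve"] = False
        line["state"] = COMMIT_STATE.get(line["state"], line["state"])
    return messages
```

After this function returns, no line remains in a transactional state and both lists are empty, at which point the processor would take transition 4 to the Normal state.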

While in the Commit state, incoming Intervention, Invalidate, and Update requests are held until the processor exits this state. It may be feasible to handle these requests in this state as a performance optimization by taking the actions needed to produce the same result as would occur after the Commit state is complete. Any transactional memory reference instruction that is issued stalls until the processor exits the Commit state. Commit and Abort instructions are treated as no operation instructions (NOPs) if executed when the processor is not in the Transaction state. In some implementations, these instructions trap if an attempt is made to execute them when already in the Commit state.

When in the Transaction state, the following situations will cause a transition to the Abort state (transition 3 in the state diagram of FIG. 1), aborting the current transaction:

-   Execution of an Abort instruction.
-   The processor takes an exception.
-   An Invalidate or Update Request is received whose address matches any cache line that is part of the Read Set.
-   An Intervention is received whose address matches any cache line that is part of the Write Set.

Upon execution of an Abort instruction, the processor enters the Abort state. In this state, all changes to memory performed during the aborted transaction are discarded, restoring the state of the contents of the Write Set to its state prior to the start of the transaction. To accomplish this, the following actions are performed:

-   Eliminate messages may be sent to the directory for all addresses in the Evicted Address list (this is a performance optimization which is optional). The Evicted Address list is discarded and the tokens in the list are attached to the end of the free list. The Evict Pointer is set to null.
-   Eliminate messages may be sent to the directory for all addresses in the Writeback list (this is a performance optimization which is optional). All writebacks stored in the Writeback list are discarded. All tokens in the Writeback list are attached to the end of the free list. The Writeback Pointer is set to null. The L2 cache is invalidated at the address of the writeback if that address is currently stored in the L2 cache.
-   All TV and TVE bits in the primary cache are set to zero.
-   All cache lines in the ST state transition to the S state. All cache lines in the ET state transition to the E state. All cache lines in the DT state transition to the I state. Eliminate messages may be sent to the directory for all cache lines transitioned to the I state.

Upon completion of the above actions, the processor transitions to the Suspended state (transition 5 in the state diagram of FIG. 1) and remains there until a Commit instruction is executed (Commit instructions will stall if dispatched while in the Abort state and execute as soon as the transition to the Suspended state occurs).
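The abort actions may be sketched in the same manner. As before, the layout is assumed for illustration: the cache is a dictionary of records with `state`, `tv`, and `tve` entries, the Writeback list holds `(address, data)` pairs, and list entries stand in for tokens. The function returns the addresses for which the optional Eliminate messages could be sent; whether they are sent is left to the implementation.

```python
ABORT_STATE = {"ST": "S", "ET": "E", "DT": "I"}


def abort_cleanup(cache, evicted, writeback, free_list, l2):
    """Discard transactional changes; return addresses eligible for the
    optional Eliminate messages."""
    eliminate = list(evicted) + [addr for addr, _ in writeback]
    # Discard both lists and return their tokens to the free list.
    free_list.extend(evicted)
    evicted.clear()
    for addr, _ in writeback:
        l2.discard(addr)          # drop any L2 copy of a discarded writeback
    free_list.extend(writeback)
    writeback.clear()
    for addr, line in cache.items():
        line["tv"] = line["tve"] = False
        if line["state"] == "DT":
            eliminate.append(addr)  # modified data is dropped (DT -> I)
        line["state"] = ABORT_STATE.get(line["state"], line["state"])
    return eliminate
```

The only difference from the commit path in terms of cache states is the DT line: its modified data is never written anywhere, so memory retains the value it held before the transaction began.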

While in the Abort state, incoming Intervention, Invalidate, and Update requests are held until the processor exits this state. It may be feasible to handle these requests in this state as a performance optimization by taking the actions needed to produce the same result as would occur after the abort instruction is complete. Any transactional memory reference instruction that is issued stalls until the processor exits the Abort state.

The processor enters the Suspended state as soon as it completes the cleanup of the aborted transaction in the Abort state. While in the Suspended state, the processor executes as in the Normal state except that all transactional memory reference instructions are treated as NOPs. Upon executing a Commit instruction, the processor transitions to the Normal state, making it ready to begin another transaction.
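The five operating states and the transitions described so far may be summarized as a small table-driven state machine. The sketch below is illustrative only; the event names are assumptions, and any (state, event) pair not listed is treated as leaving the state unchanged, which stands in for both the NOP and the stall behaviors described in the text.

```python
TRANSITIONS = {
    ("Normal", "transactional_memory_reference"): "Transaction",  # transition 1
    ("Transaction", "commit_instruction"): "Commit",              # transition 2
    ("Transaction", "abort_event"): "Abort",                      # transition 3
    ("Commit", "cleanup_complete"): "Normal",                     # transition 4
    ("Abort", "cleanup_complete"): "Suspended",                   # transition 5
    ("Suspended", "commit_instruction"): "Normal",  # not numbered in the text
}


def step(state, event):
    """Return the next operating state; unlisted pairs keep the state."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="Normal"):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, an aborted transaction passes through Abort and Suspended and only returns to Normal once a Commit instruction is executed, while transactional memory references issued in the Suspended state leave the state unchanged.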

The following new processor instructions are added:

TEST T (R)—Sets register R to a non-zero Reason Code (reason codes to be defined) if the processor is currently in the Abort or Suspended states; sets R to zero otherwise. This instruction is used to test whether the current transaction has been aborted, so that the execution of useless instructions can be skipped.

ABORT—Aborts the current transaction. If the processor is in the Transaction state, sets the Transaction State to the Abort state, thereby initiating the actions described above. If the current transaction has already aborted, or the processor is in any state other than the Transaction state, this instruction acts as a NOP.

COMMIT (R)—Attempts to commit the current transaction. If the processor is in the Transaction state, sets the Transaction state to the Commit state, performing the commit of the current transaction, as described above. If the current transaction has already aborted (the processor being in the Suspended state), the COMMIT instruction causes a transition to the Normal state. If the current state is the Abort state, the COMMIT instruction stalls until transaction cleanup completes, after which the processor transitions to the Normal state. Register R is set to a non-zero Reason Code (reason codes to be defined) if the processor is currently in the Abort or Suspended states; R is set to zero otherwise. If executed while in the Normal or Commit states, a COMMIT instruction acts as a NOP or may cause an exception.

For the following group of Transactional Memory Reference instructions, if the processor's state is Normal, executing any of them sets the processor state to Transaction. These instructions may be in single and double word, integer, and floating point forms.

LT (Load Transactional)—Performs a Load for read access only and adds the referenced memory location to the Read Set of the current transaction. This instruction acts exactly like an ordinary Load instruction, except that it sets the cache state to the ST state instead of the S state. If the cache line is already in the S or E state, it transitions to ST; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to ST. If the cache line is already in any transactional state (ST, ET, or DT), the state remains unchanged.

LTX (Load Transactional Exclusive)—Performs a Load for write access and adds the referenced memory location to the Write Set of the current transaction. This instruction acts exactly like an ordinary Load instruction, except that it issues a read exclusive request to the directory and sets the cache state to the ET state instead of the S state. If the cache line is already in the S, ST, or E states, it sends an Upgrade request to the directory and transitions to ET; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to the ET state. If the cache line is already in the ET or DT state, the state remains unchanged. This instruction may replace an LL instruction.

STX (Store Transactional)—Performs a Store and adds the referenced memory location to the Write Set of the current transaction. This instruction acts exactly like an ordinary Store instruction, except that it sets the cache state to the DT state instead of the D state. If the cache line is already in the S, ST, or E states, it sends an Upgrade request to the directory and transitions to the DT state; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to the DT state; if already in the ET state, the cache line transitions to the DT state. If the cache line is already in the DT state, the state remains unchanged.
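The cache state changes caused by the LT, LTX, and STX instructions may be tabulated by a function such as the one below. This is a sketch under stated assumptions: the returned action labels (`"read"`, `"read_exclusive"`, `"upgrade"`, `"writeback"`) are hypothetical names, and the request issued by STX on a line in the I state is assumed to be a read exclusive, which the text does not state explicitly.

```python
def next_state(instr, state):
    """Return (new cache state, directory action or None)."""
    if instr == "LT":
        if state in ("ST", "ET", "DT"):
            return state, None            # already transactional
        if state == "D":
            return "ST", "writeback"      # Writeback with Data Retained
        if state == "I":
            return "ST", "read"
        return "ST", None                 # S or E
    if instr == "LTX":
        if state in ("ET", "DT"):
            return state, None
        if state == "D":
            return "ET", "writeback"
        if state == "I":
            return "ET", "read_exclusive"
        return "ET", "upgrade"            # S, ST, or E
    if instr == "STX":
        if state == "DT":
            return state, None
        if state == "ET":
            return "DT", None
        if state == "D":
            return "DT", "writeback"
        if state == "I":
            return "DT", "read_exclusive"  # assumption, see lead-in
        return "DT", "upgrade"            # S, ST, or E
    raise ValueError(f"unknown instruction: {instr!r}")
```

The writeback performed on a D line before it becomes transactional ensures that memory holds the pre-transaction value, so that an abort can simply discard the line.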

FIG. 3 shows the cache state transitions due to instruction execution. The following shows the system behavior for the various cache states under the extended coherency model needed to support the functions described above.

Invalid (I)—Cache line is not in use and contains no valid data. The directory may be in any state.

Shared (S)—Cache line contains a copy of data which is the same as the contents of memory and the contents of other caches also in S or ST states. The directory will be in the S state and its sharing vector will point at this node.

Shared Transactional (ST)—Cache line contains a copy of data that is the same as the contents of memory (and the same as the contents of other caches also in the S or ST states). The collection of all cache lines in the ST state plus all of the cache lines in the Eviction List constitutes the Read Set of a transaction. The directory will be in the S state and its sharing vector will point at this node. When a cache line is in the ST state and the processor is in the Transaction state, an eviction of the line from the processor's cache will cause the evicted address to be added to the Eviction List and the TV bit for that cache tag to be set.

Exclusive (E)—Cache line contains a copy of data that is the same as the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions. The directory will be in the E state and its pointer will point at this node.

Exclusive Transactional (ET)—Cache line contains a copy of data that is the same as the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions. The directory will be in the E state and its pointer will point at this node. When a cache line is in the ET state and the processor is in the Transaction state, an eviction of the line from the processor's cache will cause the evicted address to be added to the Writeback List and the TVE bit for that cache tag to be set.

Dirty (D)—Cache line contains modified data that is different from the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions. The directory will be in the E state and its pointer will point at this node.

Dirty Transactional (DT)—Cache line contains modified data that is different from the contents of memory. The directory will be in the E state and its pointer will point at this node. When a cache line is in the DT state and the processor is in the Transaction state, an eviction of the line from the processor's cache will cause the evicted address and data to be added to the Writeback List and the TVE bit for that cache tag to be set.
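The eviction behavior described for the ST, ET, and DT states may be sketched as follows. The list representations are assumptions for illustration: the Eviction List holds bare addresses, while the Writeback List holds `(address, data)` pairs in which `None` marks an ET entry recorded by address only. The function name `evict` and its return convention are likewise hypothetical.

```python
def evict(state, addr, data, eviction_list, writeback_list):
    """Record the eviction of a line while the processor is in the
    Transaction state. Returns the (TV, TVE) bits to OR into the tag."""
    if state == "ST":
        eviction_list.append(addr)            # Read Set member, address only
        return True, False
    if state == "ET":
        writeback_list.append((addr, None))   # data still matches memory
        return False, True
    if state == "DT":
        writeback_list.append((addr, data))   # modified data must be kept
        return False, True
    return False, False  # non-transactional lines follow the conventional path
```

Because evicted transactional lines are held in these lists rather than lost, the set of locations a transaction may reference is not bounded by the capacity of the cache.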

In summary, the state of the processor during memory transactions is maintained in a transaction record of the processor. The coherency protocol for the cache lines is extended to include additional states. By providing support for memory transactions along with an expanded cache state implementation, an improved cache coherency protocol is achieved. The processing discussed above may be incorporated entirely in computer software code, on a computer readable medium, or be incorporated into a combined software/hardware implementation.

One of the advantages provided by the present invention is that the cache coherency protocol does not need to be changed. Moreover, the directory structures are unchanged on the memory modules. Another important advantage is that the footprint of a transaction is not limited by the size of the cache within a processor module. A sequence of instructions can be treated as a single transaction that is either atomically executed with respect to other sequences of instructions or is not executed. The number of distinct memory locations referenced by an instruction sequence as a single transaction, in a system having a processor module with a processor and a cache, is not limited by the size of the cache.

Thus, it is apparent that there has been provided, in accordance with the present invention, a system and method for performing memory operations in a computing system that satisfies the advantages set forth above. Although the present invention has been described in detail, it should be understood that various changes, substitutions, and alterations may be readily ascertainable by those skilled in the art and may be made herein without departing from the spirit and scope of the present invention as defined by the following claims. Moreover, the present invention is not intended to be limited in any way by any statement made herein that is not otherwise reflected in the appended claims.

What is claimed is:
 1. A method of performing memory operations in a computing system, comprising: transitioning a cache line associated with a memory location from a conventional coherency protocol to one of a plurality of extended coherency protocol states associated with an operating state of a processor; performing an update to the cache line associated with the memory location in accordance with the operating state of the processor, the update to the cache line not being visible to other processors in the computing system; and tracking access to a memory location by identifying the cache line with the extended coherency protocol state according to the update performed.
 2. The method of claim 1, wherein the conventional coherency protocol includes a MESI coherency protocol.
 3. The method of claim 1, wherein the plurality of extended coherency protocol states is associated with a Transaction operating state of the processor.
 4. The method of claim 1, wherein the plurality of extended coherency protocol states includes a Shared Transactional state characterized by the cache line having a copy of data that is the same as the corresponding contents of the memory and one or more other cache lines also in a Shared Transactional state.
 5. The method of claim 4, when the cache line is in the Shared Transactional state and in response to an eviction of an address from the cache line, further comprising: adding the evicted address to an Eviction List; and setting one of two cache tag constituent elements.
 6. The method of claim 1, wherein the plurality of extended coherency protocol states include an Exclusive state characterized by the cache line having an exclusive copy of data that is the same as the corresponding contents of the memory, such that no other cache has a copy of said data.
 7. The method of claim 6, when the cache line is in the Exclusive state, further comprising writing to the cache line without performing a coherency transaction.
 8. The method of claim 1, wherein the plurality of extended coherency protocol states include an Exclusive Transactional state characterized by the cache line having an exclusive copy of data that is the same as the corresponding contents of the memory, such that no other cache has a copy of said data.
 9. The method of claim 8, when the cache line is in the Exclusive Transactional state and in response to an eviction of an address from the cache line, further comprising: adding the evicted address to a Writeback List; and setting one of two cache tag constituent elements.
 10. The method of claim 8, further comprising writing to the cache line without performing a coherency transaction when the cache line is in the Exclusive Transactional state.
 11. The method of claim 1, wherein the plurality of extended coherency protocol states include a Dirty state characterized by the cache line having modified data that is different from the corresponding contents of the memory, and wherein no other cache has a copy of the modified data.
 12. The method of claim 11, further comprising writing to the cache line without performing a coherency transaction when the cache line is in the Exclusive Transactional state.
 13. The method of claim 1, wherein the plurality of extended coherency protocol states include a Dirty Transactional state characterized by the cache line having modified data that is different from the corresponding contents of the memory.
 14. The method of claim 13, when the cache line is in the Dirty Transactional state and in response to an eviction of an address from the cache line, further comprising: adding the evicted address and data to a Writeback List; and setting one of two cache tag constituent elements. 